
feat: add phased decision report to orchestrator#45

Closed
WellDunDun wants to merge 1 commit into dev from WellDunDun/orchestrator-explain

@WellDunDun
Collaborator

Replace the bare-bones numeric summary with a structured decision report that explains each orchestration phase (sync, status, decisions, evolve, watch). Operators can now see exactly which skills were considered, which were skipped, and why — along with evolution results and watch alerts.

Changes

  • formatOrchestrateReport(): New pure function builds a 5-phase decision report from OrchestrateResult
  • JSON output: Now includes decisions array with per-skill action, reason, and outcome (deployed/validation/alert details)
  • Human output: Replaces bare summary with full report showing sync sources, status breakdown, per-skill decisions, evolution results, and watch phase
  • Test coverage: 11 new tests covering all report phases
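The per-skill `decisions` array in the JSON output can be pictured as a flat record per candidate. The sketch below is hypothetical: the `SkillAction` shape, the field names, and the `toDecisionRecord` helper are assumptions for illustration, not the types in `cli/selftune/orchestrate.ts`.

```typescript
// Hypothetical shapes, loosely modeled on the fields the PR mentions
// (action, reason, and deployed/validation/alert outcome details).
type Action = "evolve" | "watch" | "skip";

interface SkillAction {
  skill: string;
  action: Action;
  reason: string;
  evolveResult?: { deployed: boolean; validation?: { improved: boolean } };
  watchResult?: { alert: string | null; snapshot?: { pass_rate: number } };
}

interface DecisionRecord {
  skill: string;
  action: Action;
  reason: string;
  deployed?: boolean;
  improved?: boolean;
  alert?: string | null;
  pass_rate?: number;
}

// Flatten one candidate into a JSON-friendly decision record.
// Optional chaining keeps the mapping safe when a phase produced no result;
// undefined fields are dropped by JSON.stringify.
function toDecisionRecord(c: SkillAction): DecisionRecord {
  const record: DecisionRecord = { skill: c.skill, action: c.action, reason: c.reason };
  if (c.evolveResult) {
    record.deployed = c.evolveResult.deployed;
    record.improved = c.evolveResult.validation?.improved;
  }
  if (c.watchResult) {
    record.alert = c.watchResult.alert;
    record.pass_rate = c.watchResult.snapshot?.pass_rate;
  }
  return record;
}
```

Serializing the whole run would then be a matter of `JSON.stringify({ decisions: candidates.map(toDecisionRecord) })`, keeping the report logic free of I/O.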

Safe defaults remain intact; no changes to core telemetry semantics.

🤖 Generated with Claude Code

Replace the bare-bones numeric summary with a structured decision report
that explains each orchestration phase (sync, status, decisions, evolve,
watch) so operators can understand what the loop did and why.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Mar 14, 2026

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced CLI orchestration reporting with dual output: structured JSON containing per-skill decisions and results on stdout, and a formatted human-readable decision report on stderr.
  • Tests

    • Added extensive test coverage for report generation, validating formatting across all phases and scenarios.

Walkthrough

This PR adds human-readable report formatting to the orchestration CLI. It introduces helper functions to render phase-specific decision reports and exports a new formatOrchestrateReport function that converts orchestration results to formatted text. CLI output is restructured to emit JSON to stdout and formatted reports to stderr, with comprehensive test coverage validating formatting across various orchestration states.
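Splitting output this way keeps stdout machine-readable (for example, when piping the command into `jq`) while the human report still reaches the terminal on stderr. A minimal sketch, assuming a hypothetical `emitOutput` helper that takes writer callbacks so it can be tested without touching the real streams; the `summary` field is also an assumption.

```typescript
type Writer = (chunk: string) => void;

// Hypothetical payload shape for illustration only.
interface OrchestrateOutput {
  decisions: unknown[];
  summary: Record<string, number>;
}

// Write the JSON payload to one stream and the human report to another,
// so a consumer parsing stdout never sees report text.
function emitOutput(payload: OrchestrateOutput, report: string, out: Writer, err: Writer): void {
  out(`${JSON.stringify(payload, null, 2)}\n`);
  err(`${report}\n`);
}
```

In a real CLI the writers would be `(s) => process.stdout.write(s)` and `(s) => process.stderr.write(s)`.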

Changes

| Cohort / File(s) | Summary |
|---|---|
| CLI Reporting Functions (`cli/selftune/orchestrate.ts`) | Added formatOrchestrateReport and phase-specific helper functions (formatSyncPhase, formatStatusPhase, formatDecisionPhase, formatEvolutionPhase, formatWatchPhase). Restructured CLI output to emit JSON object with decisions array to stdout and human-readable formatted report to stderr. Replaced inline summary printing. |
| Test Coverage (`tests/orchestrate.test.ts`) | Added comprehensive tests for formatOrchestrateReport covering dry-run/auto-approve modes, sync source availability, repair info, status breakdown, per-skill decisions, evolution/watch results, and summary counts. Introduced makeOrchestrateResult helper function for test construction. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • selftune#40: Directly modifies the same cli/selftune/orchestrate.ts CLI output path and exports, introducing the JSON stdout + stderr report structure that this PR tests and refines.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.50%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | Title follows conventional commits format with 'feat:' prefix and accurately describes the main change of adding a phased decision report to the orchestrator. |
| Description check | ✅ Passed | Description is directly related to the changeset, detailing the new formatOrchestrateReport function, JSON output enhancements, human-readable output changes, and test coverage additions. |

coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/selftune/orchestrate.ts`:
- Around line 165-179: In formatWatchPhase, avoid appending empty parentheses
when c.watchResult?.snapshot is missing by building a stats string only when
snapshot exists: compute passInfo and baseInfo from snap, join them into a
non-empty statsPart, and then only append ` (${statsPart})` to the line when
statsPart is truthy; leave alertTag and reason formatting unchanged. Ensure you
reference formatWatchPhase, the watched variable, c.watchResult?.snapshot
(snap), passInfo/baseInfo, and alertTag when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 37e0b1ca-27e1-400d-921e-5fd71478d03b

📥 Commits

Reviewing files that changed from the base of the PR and between aec5f4f and 9a0e025.

📒 Files selected for processing (2)
  • cli/selftune/orchestrate.ts
  • tests/orchestrate.test.ts

Comment on lines +165 to +179
function formatWatchPhase(candidates: SkillAction[]): string[] {
  const watched = candidates.filter((c) => c.action === "watch");
  if (watched.length === 0) return [];

  const lines: string[] = ["Phase 5: Watch"];
  for (const c of watched) {
    const snap = c.watchResult?.snapshot;
    const passInfo = snap ? `pass_rate=${snap.pass_rate.toFixed(2)}` : "";
    const baseInfo = snap ? `, baseline=${snap.baseline_pass_rate.toFixed(2)}` : "";
    const alertTag = c.watchResult?.alert ? " [ALERT]" : "";
    lines.push(`  ${c.skill.padEnd(20)} ${c.reason}${alertTag} (${passInfo}${baseInfo})`);
  }

  return lines;
}

⚠️ Potential issue | 🟡 Minor

Empty parentheses appended when snapshot is missing.

If c.watchResult?.snapshot is undefined (snap falsy), passInfo and baseInfo are both empty strings, resulting in output like "SkillName reason ()". Consider omitting the parentheses entirely when there's no snapshot data.

Proposed fix
   const snap = c.watchResult?.snapshot;
   const passInfo = snap ? `pass_rate=${snap.pass_rate.toFixed(2)}` : "";
   const baseInfo = snap ? `, baseline=${snap.baseline_pass_rate.toFixed(2)}` : "";
   const alertTag = c.watchResult?.alert ? " [ALERT]" : "";
-  lines.push(`  ${c.skill.padEnd(20)} ${c.reason}${alertTag} (${passInfo}${baseInfo})`);
+  const metrics = snap ? ` (${passInfo}${baseInfo})` : "";
+  lines.push(`  ${c.skill.padEnd(20)} ${c.reason}${alertTag}${metrics}`);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/selftune/orchestrate.ts` around lines 165 - 179, In formatWatchPhase,
avoid appending empty parentheses when c.watchResult?.snapshot is missing by
building a stats string only when snapshot exists: compute passInfo and baseInfo
from snap, join them into a non-empty statsPart, and then only append
` (${statsPart})` to the line when statsPart is truthy; leave alertTag and reason
formatting unchanged. Ensure you reference formatWatchPhase, the watched
variable, c.watchResult?.snapshot (snap), passInfo/baseInfo, and alertTag when
making the change.
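A self-contained sketch of the `statsPart` approach described in the prompt. The `SkillAction` and `WatchSnapshot` types are minimal stand-ins (assumptions), not the repository's definitions; the formatting mirrors the excerpt under review, except that the parentheses are emitted only when there is something to put in them.

```typescript
// Minimal stand-in types (assumed, not the repository's definitions).
interface WatchSnapshot {
  pass_rate: number;
  baseline_pass_rate: number;
}

interface SkillAction {
  skill: string;
  action: "evolve" | "watch" | "skip";
  reason: string;
  watchResult?: { alert: string | null; snapshot?: WatchSnapshot };
}

function formatWatchPhase(candidates: SkillAction[]): string[] {
  const watched = candidates.filter((c) => c.action === "watch");
  if (watched.length === 0) return [];

  const lines: string[] = ["Phase 5: Watch"];
  for (const c of watched) {
    const snap = c.watchResult?.snapshot;
    // Build the stats only from data that exists; no snapshot yields "".
    const statsPart = snap
      ? [`pass_rate=${snap.pass_rate.toFixed(2)}`, `baseline=${snap.baseline_pass_rate.toFixed(2)}`].join(", ")
      : "";
    const alertTag = c.watchResult?.alert ? " [ALERT]" : "";
    const metrics = statsPart ? ` (${statsPart})` : "";
    lines.push(`  ${c.skill.padEnd(20)} ${c.reason}${alertTag}${metrics}`);
  }
  return lines;
}
```

Rendering is identical to the proposed diff when a snapshot exists; the only behavioral change is that a missing snapshot no longer produces `()`.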

WellDunDun added a commit that referenced this pull request Mar 14, 2026
Orchestrate output now explains each decision clearly so users can trust
the autonomous loop. Adds formatOrchestrateReport() with 5-phase human
report (sync, status, decisions, evolution, watch) and enriched JSON
with per-skill decisions array. Supersedes PR #45.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@WellDunDun
Collaborator Author

superseded

WellDunDun closed this on Mar 15, 2026
WellDunDun added a commit that referenced this pull request Mar 15, 2026
* feat: add phased decision report to orchestrator

Orchestrate output now explains each decision clearly so users can trust
the autonomous loop. Adds formatOrchestrateReport() with 5-phase human
report (sync, status, decisions, evolution, watch) and enriched JSON
with per-skill decisions array. Supersedes PR #45.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update orchestrate workflow docs and changelog for decision report

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove redundant null check in formatEvolutionPhase

The filter already guarantees evolveResult is defined; use non-null
assertion instead of a runtime guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add defensive optional chaining for watch snapshot in JSON output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: avoid empty parentheses in watch report when snapshot missing

Consolidates pass_rate and baseline into a single conditional metrics
suffix so lines without a snapshot render cleanly. Addresses CodeRabbit
review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve biome lint and format errors

Replace non-null assertion (!) with type-safe cast to satisfy
noNonNullAssertion rule, and collapse single-arg lines.push to one line
per biome formatter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
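The follow-up commits in this squash first replace a runtime guard with a non-null assertion, then replace the assertion with a cast to satisfy Biome's `noNonNullAssertion` rule. Another option, sketched here under assumed types, is a user-defined type guard passed to `filter`, so the narrowed element type already carries `evolveResult` and neither `!` nor a cast is needed. `EvolveResult`, `hasEvolveResult`, and `formatEvolutionLines` are hypothetical names.

```typescript
// Assumed shapes for illustration.
interface EvolveResult {
  deployed: boolean;
  reason: string;
}

interface SkillAction {
  skill: string;
  action: "evolve" | "watch" | "skip";
  evolveResult?: EvolveResult;
}

// Type predicate: Array.prototype.filter narrows its result to elements
// whose evolveResult is known to be present.
function hasEvolveResult(c: SkillAction): c is SkillAction & { evolveResult: EvolveResult } {
  return c.evolveResult !== undefined;
}

function formatEvolutionLines(candidates: SkillAction[]): string[] {
  return candidates.filter(hasEvolveResult).map((c) => {
    // c.evolveResult has type EvolveResult here, with no assertion or cast.
    const status = c.evolveResult.deployed ? "deployed" : "not deployed";
    return `  ${c.skill}: ${status} (${c.evolveResult.reason})`;
  });
}
```

The predicate keeps the runtime check and the compile-time guarantee in one place, so the filter and the later property access cannot drift apart.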
WellDunDun added a commit that referenced this pull request Mar 15, 2026
* Add make clean-branches target for repo hygiene

Deletes Conductor worktree branches (custom/prefix/router-*),
selftune evolve test branches, and orphaned worktree-agent-* branches.
Also prunes stale remote tracking refs. Run with `make clean-branches`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add composability v2: synergy detection, sequence extraction, workflow candidates

Extends the composability analysis with positive interaction detection
(synergy scores), ordered skill sequence extraction from usage timestamps,
and automatic workflow candidate flagging. Backwards compatible — v1
function and tests unchanged, CLI falls back to v1 when no usage log exists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
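Ordered sequence extraction from usage timestamps can be illustrated with a small, hypothetical sketch: group usage records by session, sort each group by timestamp, and count identical skill orderings. The `UsageRecord` shape and `extractSequences` are assumptions; synergy scoring and workflow-candidate flagging are not shown.

```typescript
// Assumed usage-log shape for illustration.
interface UsageRecord {
  session_id: string;
  skill: string;
  timestamp: string; // ISO-8601
}

// Map from "skillA -> skillB -> ..." to the number of sessions using that order.
function extractSequences(records: UsageRecord[]): Map<string, number> {
  const bySession = new Map<string, UsageRecord[]>();
  for (const r of records) {
    const group = bySession.get(r.session_id) ?? [];
    group.push(r);
    bySession.set(r.session_id, group);
  }

  const counts = new Map<string, number>();
  for (const group of Array.from(bySession.values())) {
    if (group.length < 2) continue; // a single skill is not a sequence
    const key = group
      .slice()
      .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp))
      .map((r) => r.skill)
      .join(" -> ");
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Sorting by timestamp rather than log order means records appended out of order still produce the same sequence key.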

* Add workflow discovery, SKILL.md writer, and CLI command (v0.3)

Implements multi-skill workflow support: discovers workflow patterns from
existing telemetry, displays them via `selftune workflows`, and codifies
them to SKILL.md via `selftune workflows save`. Includes 48 tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback for workflows

* fix: stabilize biome config for CI lint

* fix: address new PR review threads

* Fix all 13 demo findings: grade/evolve/status/doctor/telemetry

BUG-1: Remove false-positive git hook checks, fix hook key names to PascalCase
BUG-2: Auto-derive expectations from SKILL.md when none provided
BUG-3: Add --help output to grade command documenting --session-id
BUG-4: Prefer skills_invoked over skills_triggered in session matching
BUG-5: Add pre-flight validation and human-readable errors to evolve
BUG-6: Distinguish real Skill tool calls from SKILL.md browsing reads
IMP-1: Confirmed templates/ in package.json files array
IMP-2: Auto-install agent files during init
IMP-3: Show UNGRADED instead of CRITICAL when no graded sessions exist
IMP-4: Use portable npx selftune hook <name> instead of absolute paths
IMP-5: Add selftune auto-grade command
IMP-6: Mandate AskUserQuestion in evolve workflows
IMP-7: Add selftune quickstart command

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix hook subcommand to spawn hooks as subprocesses

Hook files guard execution behind import.meta.main, so dynamically
importing them was a no-op. Spawn as subprocess instead so stdin
payloads are processed and hooks write telemetry logs correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address review comments

* Fix lint CI failures

* Trigger CI rerun

* Fix lint command resolution

* Address remaining review comments

* Fix grade-session test isolation

* docs: Align selftune skill docs with shipped workflows (#31)

* Update selftune workflow docs and skill versioning

* Improve selftune skill portability and setup docs

* Clarify workflow doc edge cases

* Fix OpenClaw doctor validation and workflow docs

* Polish composability and setup docs

* Fix BUG-7, BUG-8, BUG-9 from demo findings (#32)

* Fix BUG-7, BUG-8, BUG-9 from demo findings

BUG-7: Add try/catch + array validation around eval-set file loading in
evolve() so parse errors surface as user-facing messages instead of
silent exit.

BUG-8: Add cold-start bootstrap — when extractFailurePatterns returns
empty but the eval set has positive entries, treat those positives as
missed queries so evolve can work on skills with zero usage history.
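The BUG-8 cold-start rule amounts to a fallback when no failure patterns exist. A hypothetical sketch follows: `EvalEntry`, its `should_trigger` flag, and `selectMissedQueries` are assumed names, and failure patterns are modeled as plain strings for brevity.

```typescript
// Assumed eval-set entry shape.
interface EvalEntry {
  query: string;
  should_trigger: boolean; // true = positive example the skill should handle
}

// Prefer extracted failure patterns; if there are none, treat the eval set's
// positive entries as missed queries so a skill with no usage history can evolve.
function selectMissedQueries(failurePatterns: string[], evalSet: EvalEntry[]): string[] {
  if (failurePatterns.length > 0) return failurePatterns;
  return evalSet.filter((e) => e.should_trigger).map((e) => e.query);
}
```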

BUG-9: Add --out flag to evals CLI parseArgs as alias for --output.
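If the evals CLI uses Node's built-in `parseArgs` from `node:util` (an assumption; the commit only says `parseArgs`), the alias can be expressed by declaring both options and preferring `--output` when both are passed. `resolveOutputPath` is a hypothetical helper.

```typescript
import { parseArgs } from "node:util";

// Accept both --output and --out; --output wins if both are given.
function resolveOutputPath(argv: string[]): string | undefined {
  const { values } = parseArgs({
    args: argv,
    options: {
      output: { type: "string" },
      out: { type: "string" },
    },
    allowPositionals: true,
  });
  return values.output ?? values.out;
}
```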

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix evolve CI regressions

* Isolate blog proof fixture mutations

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Fix dashboard data and extract telemetry contract (#33)

* Fix dashboard export and layout

* Improve telemetry normalization groundwork

* Add test runner state

* Separate and extract telemetry contract

* Fix telemetry CI lint issues

* Fix remaining CI regressions

* Detect zero-trigger monitoring regressions

* Stabilize dashboard report route tests

* Address telemetry review feedback

* Fix telemetry normalization edge cases (#34)

* Fix telemetry follow-up edge cases

* Fix rollback payload and Codex prompt attribution

* Tighten Codex rollout prompt tracking

* Update npm package metadata (#35)

* Prepare 0.2.1 release (#36)

* Prepare 0.2.1 release

* Update README install path

* Use trusted publishing for npm

* feat: harden LLM calls and fix test failures (#38)

* feat: consume @selftune/telemetry-contract as workspace package

Replace relative path imports of telemetry-contract with the published
@selftune/telemetry-contract workspace package. Adds workspace config to
package.json and expands tsconfig includes to cover packages/*.

Closes SEL-10

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(telemetry-contract): add versioning, metadata, and golden fixtures

Add version 1.0.0 and package metadata (description, author, license,
repository) to the telemetry-contract package. Create golden fixture file
with one valid example per record kind and a test suite that validates
all fixtures against the contract validator.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Make selftune source-truth driven

* Harden live dashboard loading

* Audit cleanup: test split, docs, lint fixes

- Add make test-fast / test-slow targets (5s vs 80s, 16x faster dev loop)
- Add bun run test:fast / test:slow scripts in package.json
- Reposition README as "Claude Code first", update competitive comparison
- Bump PRD.md version to 0.2.1
- Add CHANGELOG unreleased section (source-truth, telemetry-contract, test split)
- Fix pre-existing lint: types.ts formatting, golden.test.ts import order

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add sync flags and hook dispatch to integration guide

- Document all selftune sync flags (--since, --dry-run, --force, etc.)
- Add selftune hook dispatch command with all 6 hook names
- Verified init, activation rules, and source-truth sections already current

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Harden LLM calls and fix pre-existing test failures

Add exponential backoff retry to callViaAgent for transient subprocess
failures. Cap JSONL health-check validation at 500 lines to prevent
timeouts on large log files. Use exported DEFAULT_WINDOW_SESSIONS
constant in dashboard data collection instead of telemetry.length.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
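Exponential backoff for transient failures can be sketched as a generic wrapper. This is not the `callViaAgent` implementation: `withRetry`, its option names, and the injectable `sleep` (used so tests do not actually wait) are assumptions.

```typescript
interface RetryOptions {
  maxAttempts: number; // total attempts including the first; expected >= 1
  baseDelayMs: number; // delay before the first retry; doubles each time
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts - 1) {
        // Delays grow as base, 2*base, 4*base, ...
        await sleep(opts.baseDelayMs * 2 ** attempt);
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

A production version would typically also add jitter and retry only errors known to be transient.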

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add local SQLite materialization layer for dashboard (#42)

* feat: add SQLite materialization layer for dashboard queries

Add a local SQLite database (via bun:sqlite) as an indexed materialized
view store so the dashboard/report UX no longer depends on recomputing
everything from raw JSONL logs on every request.

New module at cli/selftune/localdb/ with:
- schema.ts: 10 tables + 19 indexes mirroring canonical telemetry and
  local log shapes
- db.ts: openDb() lifecycle with WAL mode, meta key-value helpers
- materialize.ts: full rebuild and incremental materialization from
  JSONL source-of-truth logs
- queries.ts: getOverviewPayload(), getSkillReportPayload(),
  getSkillsList() query helpers

Raw JSONL logs remain authoritative — the DB is a disposable cache that
can always be rebuilt. No new npm dependencies (bun:sqlite only).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve Biome lint and format errors

Auto-fix import ordering, formatting, and replace non-null assertions
with optional chaining in tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: implement autonomous selftune orchestrator core loop (#40)

* feat: add selftune orchestrate command for autonomous core loop

Introduces `selftune orchestrate` — a single entry point that chains
sync → status → evolve → watch into one coordinated run. Defaults to
dry-run mode with explicit --auto-approve for deployments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — lint errors, logic bug, and type completeness

- Replace string concatenation with template literals (Biome lint)
- Add guard in evolve loop for agent-missing skip mutations
- Replace non-null assertion with `as string` cast
- Remove unused EvolutionAuditEntry import
- Complete DoctorResult mock with required fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply Biome formatting and import sorting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Build(deps): Bump oven-sh/setup-bun from 2.1.2 to 2.1.3 (#26)

Bumps [oven-sh/setup-bun](https://github.com/oven-sh/setup-bun) from 2.1.2 to 2.1.3.
- [Release notes](https://github.com/oven-sh/setup-bun/releases)
- [Commits](oven-sh/setup-bun@3d26778...ecf28dd)

---
updated-dependencies:
- dependency-name: oven-sh/setup-bun
  dependency-version: 2.1.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Build(deps): Bump actions/setup-node from 6.2.0 to 6.3.0 (#27)

Bumps [actions/setup-node](https://github.com/actions/setup-node) from 6.2.0 to 6.3.0.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](actions/setup-node@6044e13...53b8394)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-version: 6.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Build(deps): Bump github/codeql-action from 4.32.4 to 4.32.6 (#28)

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.32.4 to 4.32.6.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@89a39a4...0d579ff)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.32.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Improve sync progress and tighten query filtering (#43)

* Improve sync progress and tighten query filtering

* Fix biome formatting errors in sync.ts and query-filter.test.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: generic scheduling and reposition OpenClaw cron as optional (#41)

* feat: add generic scheduling command and reposition OpenClaw cron as optional

The primary automation story is now agent-agnostic. `selftune schedule`
generates ready-to-use snippets for system cron, macOS launchd, and Linux
systemd timers. `selftune cron` is repositioned as an optional OpenClaw
integration rather than the main automation path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review — centralize schedule data, fix generators and formatting

Derive SCHEDULE_ENTRIES from DEFAULT_CRON_JOBS (single source of truth),
generate launchd/systemd configs for all 4 entries instead of sync-only,
fix biome formatting, and add markdown language tag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use StartCalendarInterval for fixed-time launchd and shell wrappers for chained commands

- launchd: use StartCalendarInterval (Hour/Minute/Weekday) for fixed-time
  schedules instead of approximating with StartInterval
- launchd/systemd: use /bin/sh -c wrapper for commands with && chains
  so prerequisite steps (like sync) are not silently dropped

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
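The two launchd rules in this commit can be shown with a small generator. This is an illustrative sketch, not the `selftune schedule` output: `ScheduleEntry` and `launchdFragment` are hypothetical, while `ProgramArguments`, `StartCalendarInterval`, and its `Hour`/`Minute`/`Weekday` keys are standard launchd.plist keys.

```typescript
// Hypothetical schedule entry.
interface ScheduleEntry {
  command: string; // e.g. "selftune sync && selftune orchestrate"
  hour: number;
  minute: number;
  weekday?: number; // launchd convention: 0 and 7 are Sunday
}

function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Render the ProgramArguments and StartCalendarInterval part of a plist.
function launchdFragment(entry: ScheduleEntry): string {
  // launchd runs ProgramArguments without a shell, so "&&" would be a literal
  // argument; chained commands go through /bin/sh -c instead.
  // Naive whitespace split otherwise: assumes no quoted arguments.
  const args = entry.command.includes("&&")
    ? ["/bin/sh", "-c", entry.command]
    : entry.command.split(/\s+/);
  const argXml = args.map((a) => `    <string>${escapeXml(a)}</string>`).join("\n");

  // Fixed wall-clock time instead of approximating with StartInterval.
  const interval = [
    `    <key>Hour</key><integer>${entry.hour}</integer>`,
    `    <key>Minute</key><integer>${entry.minute}</integer>`,
  ];
  if (entry.weekday !== undefined) {
    interval.push(`    <key>Weekday</key><integer>${entry.weekday}</integer>`);
  }

  return [
    "<key>ProgramArguments</key>",
    "<array>",
    argXml,
    "</array>",
    "<key>StartCalendarInterval</key>",
    "<dict>",
    ...interval,
    "</dict>",
  ].join("\n");
}
```

Note that `&&` must be XML-escaped inside the plist, which is why the argument strings pass through `escapeXml`.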

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add local dashboard SPA with React + Vite (#39)

* feat: add local dashboard SPA with React + Vite

Introduces a minimal React SPA at apps/local-dashboard/ with two routes:
overview (KPIs, skill health grid, evolution feed) and per-skill drilldown
(pass rate, invocation breakdown, evaluation records). Consumes existing
dashboard-server API endpoints with SSE live updates, explicit loading/
error/empty states, and design tokens matching the current dashboard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review feedback for local dashboard

Extract shared utils (deriveStatus, formatRate, timeAgo), add SSE exponential
backoff with max retries, filter ungraded skills from avg pass rate, fix stuck
loading state for undefined skillName, use word-boundary regex for evolution
filtering, add focus-visible styles, add typecheck script, and add Vite env
types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address second round CodeRabbit review feedback

Cancel pending SSE reconnect timers on cleanup, add stale-request guard
to useSkillReport, remove redundant decodeURIComponent (React Router
already decodes), quote font names in CSS for stylelint, and format
deriveStatus signature for Biome.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: align local dashboard SPA with SQLite v2 data architecture

Migrate SPA from old JSONL-reading /api/data endpoints to new
SQLite-backed /api/v2/* endpoints. Add v2 server routes for overview
and per-skill reports. Replace SSE with 15s polling. Rewrite types
to match materialized query shapes from queries.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review feedback on local dashboard SPA

- Add language identifier to HANDOFF.md fenced code block (MD040)
- Prevent overlapping polls in useOverview with in-flight guard and sequential setTimeout
- Broaden empty-state check in useSkillReport to include evolution/proposals
- Fix Sessions KPI to use counts.sessions instead of counts.telemetry
- Wrap materializeIncremental in try/catch to preserve last good snapshot on failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: sort imports to satisfy Biome organizeImports lint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit nitpicks — cross-platform dev script, stricter types, CSS compat

- Use concurrently for cross-platform dev script instead of shell backgrounding
- Tighten Sidebar counts prop to Partial<Record<SkillHealthStatus, number>>
- Replace color-mix() with rgba fallback for broader browser support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add UNKNOWN status filter and extract header height CSS variable

- Add UNKNOWN to STATUS_OPTIONS so all SkillHealthStatus values are filterable
- Extract hardcoded 56px header height to --header-h CSS variable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hoist sidebar collapse state to layout and add UNKNOWN filter style

- Lift collapsed state from Sidebar to Overview so grid columns resize properly
- Add .sidebar-collapsed grid rules at all breakpoints
- Fix mobile: collapsed sidebar no longer creates dead-end (shows inline)
- Add .filter-pill.active.filter-unknown CSS rule for UNKNOWN status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: serve SPA as default dashboard, legacy at /legacy/

- Dashboard server now serves built SPA from apps/local-dashboard/dist/ at /
- Legacy dashboard moved to /legacy/ route
- SPA fallback for client-side routes (e.g. /skills/:name)
- Static asset serving with content-hashed caching for /assets/*
- Path traversal protection on static file serving
- Add build:dashboard script to root package.json
- Include apps/local-dashboard/dist/ in published files
- Falls back to legacy dashboard if SPA build not found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add shadcn theming with dark/light toggle and selftune branding

Migrate dashboard to shadcn theme system with proper light/dark support.
Dark mode uses selftune site colors (navy/cream/copper), light mode uses
standard shadcn defaults. Add ThemeProvider with localStorage persistence,
sun/moon toggle in site header, and SVG logo with currentColor for both themes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: path traversal check and 404 for missing skills

Use path.relative() + isAbsolute() instead of startsWith() for the SPA
static asset path check to prevent directory traversal bypass. Return 404
from /api/v2/skills/:name when the skill has no usage data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
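The `path.relative()` + `isAbsolute()` check matters because a plain `startsWith()` prefix test accepts sibling directories: with root `/srv/dist`, the path `/srv/dist-evil/x.js` also starts with `/srv/dist`. A minimal sketch using Node's `node:path` module; `isInsideRoot` is a hypothetical name, not the dashboard server's code.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// True when `requested`, resolved against `root`, stays inside `root`.
function isInsideRoot(root: string, requested: string): boolean {
  const absRoot = resolve(root);
  const target = resolve(absRoot, requested);
  const rel = relative(absRoot, target);
  // "" is the root itself; ".." or "../x" escaped the root; an absolute
  // result means a different drive on Windows.
  if (rel === "") return true;
  return rel !== ".." && !rel.startsWith(`..${sep}`) && !isAbsolute(rel);
}
```

Checking for `..` plus the separator, rather than a bare `..` prefix, still allows legitimate names such as `..hidden` inside the root.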

* fix: biome formatting — semicolons, import order, line length

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review — dedupe polling, fix stale closures, harden theme/config

- Lift useOverview to DashboardShell, pass as prop to Overview (no double polling)
- Fix stale closure in drag handler by deriving indices from prev state
- Validate localStorage theme values, use undefined context default
- Add relative positioning to theme toggle button for MoonIcon overlay
- Fix falsy check hiding zero values in chart tooltip
- Fix invalid Tailwind selectors in dropdown-menu and toggle-group
- Use ESM-safe fileURLToPath instead of __dirname in vite.config
- Switch manualChunks to function form for Base UI subpath matching
- Align pass-rate threshold with deriveStatus in SkillReport
- Use local theme provider in sonner instead of next-themes
- Add missing React import in skeleton, remove unused Separator import
- Include vite.config.ts in tsconfig for typecheck coverage
- Fix inconsistent JSX formatting in select scroll buttons

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review round 2 — shared sorting, DnD fixes, Tailwind v4 migration

- Extract sortByPassRateAndChecks to utils.ts, dedupe sorting in App + Overview
- Derive DnD dataIds from row model (not raw data), guard against -1 indexOf
- Hide pagination when table is empty instead of showing "Page 1 of 0"
- Fix ActivityTimeline default tab to prefer non-empty dataset
- Import ReactNode directly instead of undeclared React namespace
- Quote CSS attribute selector in chart style injection
- Use stable composite keys for tooltip and legend items
- Remove unnecessary "use client" directive from dropdown-menu (Vite SPA)
- Migrate outline-none to outline-hidden for Tailwind v4 accessibility
- Fix toggle-group orientation selectors to match data-orientation attribute
- Add missing CSSProperties import in sonner.tsx
- Add dark mode variant for SkillReport row highlight
- Format vite.config.ts with Biome

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add evidence viewer, evolution timeline, and enhanced skill report

Add EvidenceViewer, EvolutionTimeline, and InfoTip components. Enhance
SkillReport with richer data display, expand dashboard server API
endpoints, and update documentation and architecture docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review round 3 — DnD/sort conflict, theme listener, formatting

- Disable DnD reorder when table sorting is active (skill-health-grid)
- Listen for OS theme preference changes when system theme is active
- Apply Biome formatting to sortByPassRateAndChecks
- Remove unused useEffect import from Overview
- Deduplicate confidence filter in SkillReport
- Materialize session IDs once in dashboard-server to avoid repeated subqueries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: show selftune version in sidebar footer

Pass version from API response through to AppSidebar and display
it dynamically instead of hardcoded "dashboard v0.1".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: biome formatting in dashboard-server — line length wrapping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review round 4 — dedupe formatRate, STATUS_CONFIG, cleanup

- Remove duplicate formatRate from app-sidebar, import from @/utils
- Extract STATUS_CONFIG to shared @/constants module, import in both
  skill-health-grid and SkillReport
- Remove misleading '' fallback from sessionPlaceholders since the
  ternary guards already skip queries when empty

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove redundant items prop from Select to avoid duplication

The SelectItem children already define the options; the items prop
was duplicating them unnecessarily.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add sortableKeyboardCoordinates to KeyboardSensor for proper keyboard DnD

Without this, keyboard navigation moves by pixels instead of jumping
between sortable items.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Linear-style dashboard UX — collapsible sidebar, direct skill links, scope grouping

- Simplify sidebar: remove status filters, keep logo + search + skills list
- Add collapsible scope groups (Project/Global) using base-ui Collapsible
- Surface skill_scope from DB query through API to dashboard types
- Replace skill drawer with direct Link navigation to skill report
- Add Scope column to skills table with filter dropdown
- Slim down site header: remove breadcrumbs, reduce to sidebar trigger + theme toggle
- Add side-by-side grid layout: skills table left, activity panel right
- Gitignore pnpm-lock.yaml alongside bun.lock

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review — accessibility, semantics, state reset

- Remove bun.lock from .gitignore to maintain build reproducibility
- Preserve unexpected scope values in sidebar (don't drop unrecognized scopes)
- Add aria-label to skill search input for screen reader accessibility
- Switch status filter from checkbox to radio-group semantics (mutually exclusive)
- Reset selectedProposal when navigating between skills via useEffect on name

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add TanStack Query and optimize SQL queries for dashboard performance

Migrate data fetching from manual polling/dedup hooks to TanStack Query
for instant cached navigation, background refetch, and request dedup.
Optimize SQL: replace NOT IN subqueries with LEFT JOIN, move JS dedup
to GROUP BY, add LIMIT 200 to unbounded evidence queries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: track root bun.lock for reproducible installs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review — collapsible sync, drag handle dedup, a11y, not-found heuristic

- Make sidebar Collapsible controlled so it auto-opens when active skill
  changes (Comment #1)
- Consolidate useSortable to single call per row via React context,
  use setActivatorNodeRef on drag handle button (Comment #2)
- Remove capitalize CSS transform on free-form scope values (Comment #3)
- Broaden isNotFound heuristic to check invocations, prompts, sessions
  in addition to evals/evolution/proposals (Comment #4)
- Move Tooltip outside TabsTrigger to avoid nested interactive elements,
  use Base UI render prop for composition (Comment #5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit nitpicks — version pinning, changelog clarity, shared query helper

- Use caret range for recharts version (^2.15.4) for consistency
- Clarify changelog: SSE was removed, polling via refetchInterval is primary
- Extract getPendingProposals() shared helper in queries.ts, used by both
  getOverviewPayload() and dashboard-server skill report endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit round 3 — deps, async fs, type safety, deterministic query

- Move @tailwindcss/vite, tailwindcss, shadcn to devDependencies
- Fix trailing space in version display when version is empty
- Type caught error as unknown in refreshV2Data
- Replace sync fs (readFileSync/statSync) with Bun.file() for hot-path asset serving
- Return 404 for missing /assets/* files instead of falling through to SPA
- Add details and eval_set fields to SkillReportPayload.evidence type
- Fix nondeterministic GROUP BY with ROW_NUMBER() CTE in getPendingProposals

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve Biome lint and format errors in CI

- Replace non-null assertion with type cast in useSkillReport (noNonNullAssertion)
- Break long import line in dashboard-server.ts to satisfy Biome formatter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit round 4 — CTE subqueries, type alignment, scope index

- Replace dynamic bind-parameter expansion with CTE subquery for session lookups
- Add skill_name to OverviewPayload.pending_proposals type to match runtime shape
- Add composite index on skill_usage(skill_name, skill_scope, timestamp) for scope lookups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit round 5 — startup guard, 404 heuristic, deterministic tiebreaker

- Guard initial v2 materialization with try/catch to avoid full server crash
- Include evidence in not-found check so evidence-only skills aren't 404'd
- Add ea.id DESC tiebreaker to ROW_NUMBER() for deterministic pending proposals
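
The ROW_NUMBER() change keeps one pending proposal per skill, with `ea.id DESC` breaking timestamp ties. Below is a minimal in-memory sketch of that ordering rule, assuming a hypothetical row shape (`id`, `skill_name`, ISO `timestamp`). It illustrates the semantics only and is not the query in queries.ts.

```typescript
// Hypothetical row shape; field names are illustrative, not the real schema.
interface ProposalRow {
  id: number;
  skill_name: string;
  timestamp: string; // ISO-8601
}

// True when `a` sorts before `b` under ORDER BY timestamp DESC, id DESC.
function ranksHigher(a: ProposalRow, b: ProposalRow): boolean {
  const ta = Date.parse(a.timestamp);
  const tb = Date.parse(b.timestamp);
  if (ta !== tb) return ta > tb;
  return a.id > b.id; // tiebreaker: equal timestamps resolve by id
}

// In-memory equivalent of keeping rn = 1 from
// ROW_NUMBER() OVER (PARTITION BY skill_name ORDER BY timestamp DESC, id DESC).
function latestPerSkill(rows: ProposalRow[]): ProposalRow[] {
  const best = new Map<string, ProposalRow>();
  for (const row of rows) {
    const current = best.get(row.skill_name);
    if (current === undefined || ranksHigher(row, current)) {
      best.set(row.skill_name, row);
    }
  }
  return Array.from(best.values());
}
```

Because the tiebreaker is a total order, the selected row no longer depends on the order in which rows happen to be scanned.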

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit round 6 — db guard, refresh throttle, deferred 404

- Guard openDb() in try/catch so DB bootstrap failure doesn't crash server
- Make db nullable, return 503 from /api/v2/* when store is unavailable
- Throttle failed refresh attempts with separate lastV2RefreshAttemptAt timestamp
- Move skill 404 check after enrichment queries (evolution, proposals, invocations)
- Use optional chaining for db.close() on shutdown
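
The refresh throttle works because failed attempts are timed separately from successful refreshes. Here is a minimal sketch, assuming a hypothetical `RefreshState` object and a 15-second interval (only `lastV2RefreshAttemptAt` is named in the commit; the interval and the other field are illustrative).

```typescript
// Hypothetical state; only lastV2RefreshAttemptAt is named in the commit.
interface RefreshState {
  lastV2RefreshAt: number; // epoch ms of the last successful refresh
  lastV2RefreshAttemptAt: number; // epoch ms of the last attempt, success or failure
}

const REFRESH_INTERVAL_MS = 15000; // assumed interval, for illustration only

function refreshIfDue(state: RefreshState, now: number, materialize: () => void): void {
  // Gate on the attempt time, not the success time: if materialization keeps
  // failing, it is retried once per interval instead of on every request.
  if (now - state.lastV2RefreshAttemptAt < REFRESH_INTERVAL_MS) return;
  state.lastV2RefreshAttemptAt = now;
  try {
    materialize();
    state.lastV2RefreshAt = now;
  } catch (err: unknown) {
    // Keep serving the previously materialized data.
    console.error("v2 refresh failed:", err instanceof Error ? err.message : String(err));
  }
}
```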

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Prepare SPA dashboard release path (#44)

* Promote product planning docs

* Add execution plans for product gaps and evals

* Prepare SPA dashboard release path

* Remove legacy dashboard runtime

* Refresh execution plans after dashboard cutover

* Build dashboard SPA in CI and publish

* Refresh README for SPA release path

* Address dashboard release review comments

* Fix biome lint errors in dashboard tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Make autonomous loop the default scheduler path

* Document orchestrate as the autonomous loop

* Document autonomy-first setup path

* Harden autonomous scheduler install paths

* Clarify sync force usage in README

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: phased decision report for orchestrator explainability (#48)

* feat: add phased decision report to orchestrator

Orchestrate output now explains each decision clearly so users can trust
the autonomous loop. Adds formatOrchestrateReport() with 5-phase human
report (sync, status, decisions, evolution, watch) and enriched JSON
with per-skill decisions array. Supersedes PR #45.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update orchestrate workflow docs and changelog for decision report

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove redundant null check in formatEvolutionPhase

The filter already guarantees evolveResult is defined; use non-null
assertion instead of a runtime guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add defensive optional chaining for watch snapshot in JSON output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: avoid empty parentheses in watch report when snapshot missing

Consolidates pass_rate and baseline into a single conditional metrics
suffix so lines without a snapshot render cleanly. Addresses CodeRabbit
review feedback.
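
Building the metrics suffix as a single conditional string is what prevents a dangling "()" in the output. A minimal sketch follows. The `WatchEntry` shape, the percentage formatting, and the "ALERT"/"ok" labels are assumptions; only `pass_rate` and `baseline` come from the commit message.

```typescript
// Hypothetical shapes; only pass_rate and baseline are named in the commit.
interface WatchSnapshot {
  pass_rate: number; // fraction in [0, 1]
  baseline: number; // fraction in [0, 1]
}

interface WatchEntry {
  skill: string;
  alert: boolean;
  snapshot?: WatchSnapshot;
}

const pct = (x: number): string => `${(x * 100).toFixed(1)}%`;

function formatWatchLine(entry: WatchEntry): string {
  // Build the whole " (...)" suffix only when a snapshot exists, so a line
  // without a snapshot ends cleanly instead of with empty parentheses.
  const metrics = entry.snapshot
    ? ` (pass_rate ${pct(entry.snapshot.pass_rate)}, baseline ${pct(entry.snapshot.baseline)})`
    : "";
  const status = entry.alert ? "ALERT" : "ok";
  return `  ${entry.skill}: ${status}${metrics}`;
}
```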

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve biome lint and format errors

Replace non-null assertion (!) with type-safe cast to satisfy
noNonNullAssertion rule, and collapse single-arg lines.push to one line
per biome formatter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: evidence-based candidate selection gating (#50)

* feat: evidence-based candidate selection with cooldown, evidence, and trend gates

Add four gating rules to selectCandidates so autonomous evolution acts on
stronger signals and skips noisy/premature candidates:

- Cooldown gate: skip skills deployed within 24h
- Evidence gate: require 3+ skill_checks for CRITICAL/WARNING
- Weak-signal filter: skip WARNING with 0 missed queries + non-declining trend
- Trend boost: declining skills prioritized higher in sort order
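
The four rules compose as an ordered series of gates followed by a sort. Below is a simplified stand-in for selectCandidates under stated assumptions, not its actual code. The field names, the HEALTHY status, the "up"/"stable" trend values, and the 300/200/+50 priority numbers are all hypothetical. The 24h cooldown and 3-check minimum come from the list above, and the trend value follows the later rename to "down".

```typescript
// Hypothetical types; only CRITICAL, WARNING, UNGRADED and "down" appear in the commits.
type Status = "CRITICAL" | "WARNING" | "HEALTHY" | "UNGRADED";
type Trend = "up" | "stable" | "down";

interface SkillHealth {
  name: string;
  status: Status;
  checks: number; // skill_checks observed for this skill
  missedQueries: number;
  trend: Trend;
  lastDeployedAt?: number; // epoch ms of the last deploy, if any
}

interface Decision {
  name: string;
  action: "evolve" | "skip";
  reason: string;
  priority: number;
}

const COOLDOWN_MS = 24 * 60 * 60 * 1000; // cooldown gate: 24h
const MIN_CHECKS = 3; // evidence gate: 3+ skill_checks

function decide(s: SkillHealth, now: number): Decision {
  const skip = (reason: string): Decision => ({ name: s.name, action: "skip", reason, priority: 0 });
  if (s.status !== "CRITICAL" && s.status !== "WARNING") return skip("not actionable");
  if (s.lastDeployedAt !== undefined && now - s.lastDeployedAt < COOLDOWN_MS) return skip("cooldown");
  if (s.checks < MIN_CHECKS) return skip("insufficient evidence");
  // Weak-signal filter: a WARNING with no missed queries that is not declining.
  if (s.status === "WARNING" && s.missedQueries === 0 && s.trend !== "down") return skip("weak signal");
  // Assumed scoring: severity sets the base, a downward trend adds a boost.
  const base = s.status === "CRITICAL" ? 300 : 200;
  const trendBoost = s.trend === "down" ? 50 : 0;
  return { name: s.name, action: "evolve", reason: s.status.toLowerCase(), priority: base + trendBoost };
}

function selectCandidates(skills: SkillHealth[], now: number): Decision[] {
  // Highest priority first; skipped skills (priority 0) stay in the list so
  // a report can still explain why they were not evolved.
  return skills.map((s) => decide(s, now)).sort((a, b) => b.priority - a.priority);
}
```

Gate order matters here: a skill in cooldown is reported as "cooldown" even if it also lacks evidence.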

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use epoch-ms comparison for timestamp gating, use cooldown constant in test

Fixes two CodeRabbit review issues:
- Timestamp comparisons in findRecentlyDeployedSkills and
  findRecentlyEvolvedSkills now use Date.parse + epoch-ms instead of
  lexicographic string comparison, which breaks on non-UTC offsets
- Test derives oldTimestamp from DEFAULT_COOLDOWN_HOURS instead of
  hardcoding 48, fixing the unused import lint error
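
The bug is easy to reproduce: two ISO strings with different offsets sort lexicographically in the wrong order. Below is a minimal sketch of the epoch-ms comparison. `isWithinHours` is a hypothetical helper name; in orchestrate the `hours` argument would play the role of the cooldown window.

```typescript
// True when `timestamp` falls within the last `hours` hours before `now` (epoch ms).
// Date.parse honors the offset, so "10:00+05:00" and "05:00Z" are the same instant.
function isWithinHours(timestamp: string, now: number, hours: number): boolean {
  const t = Date.parse(timestamp);
  if (Number.isNaN(t)) return false; // unparseable timestamps never count as recent
  return now - t < hours * 60 * 60 * 1000;
}
```

For example, "2026-03-14T10:00:00+05:00" (05:00 UTC) compares greater than "2026-03-14T06:00:00Z" as a string, even though it is the earlier instant.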

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix import formatting in orchestrate test to satisfy CI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply biome formatting to orchestrate and tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* E2E autonomy proof harness for evolution pipeline (#51)

* feat: add e2e autonomy proof harness for evolution pipeline

Proves three core autonomous evolution claims with 8 deterministic tests:
- Autonomous deploy: orchestrate selects WARNING skill, evolve deploys real SKILL.md
- Regression detection: watch fires alert when pass rate drops below baseline
- Auto-rollback: deploy→regression→rollback restores original file from backup

Uses dependency injection to skip LLM calls while exercising real file I/O
(deployProposal writes, rollback restores, audit trail persists).
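
The injection point can be pictured as a small dependency object in which only the proposal step is swapped out. Below is a minimal sketch with hypothetical `Proposal`, `EvolveDeps`, and `evolveOnce` names. The real evolve and deployProposal signatures are not shown in this PR and likely differ.

```typescript
// Hypothetical shapes; the real evolve and deployProposal signatures may differ.
interface Proposal {
  skill: string;
  body: string;
}

interface EvolveDeps {
  // Production would call an LLM here; tests inject a deterministic stub.
  propose: (skill: string, currentBody: string) => Proposal;
  readSkill: (skill: string) => string;
  writeSkill: (skill: string, body: string) => void;
}

// One deploy step: read the current SKILL.md, ask for a proposal, write it back.
function evolveOnce(skill: string, deps: EvolveDeps): { before: string; after: string } {
  const before = deps.readSkill(skill);
  const proposal = deps.propose(skill, before);
  deps.writeSkill(proposal.skill, proposal.body);
  return { before, after: deps.readSkill(skill) };
}
```

In the harness described above, the read and write sides touch real files, so deploy and rollback are exercised for real while only the LLM proposal is stubbed.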

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve Biome lint errors in autonomy-proof test

Sort imports, fix formatting, remove unused imports, replace non-null
assertions with optional chaining.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: persist orchestrate run reports in dashboard (#49)

* feat: persist orchestrate run reports and expose in dashboard SPA

Orchestrate now writes a structured run report (JSONL) after each run,
materialized into SQLite for the dashboard. A new "Orchestrate Runs"
panel on the Overview page lets users inspect what selftune did, why
skills were selected/skipped/deployed, and review autonomous decisions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review findings

- Handle rolledBack state in OrchestrateRunsPanel badge rendering
- Show loading/error states instead of false empty state for orchestrate runs
- Move ORCHESTRATE_RUN_LOG to LOG_DIR (~/.claude) per log-path convention
- Validate limit param with 400 error for non-numeric input
- Derive run report counts from final candidates instead of stale summary
- Include error message in appendJsonl catch for diagnosability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update autonomy-proof test fixtures for new candidate selection gates

After merging dev, selectCandidates gained cooldown, evidence, and
weak-signal gates. The test fixtures used snapshot: null, which caused
skills to be skipped by the insufficient-evidence gate, and
trend: "declining", which no longer matched the renamed trend value "down".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: unify summary totals to prevent CLI/dashboard metric drift

Both result.summary and runReport now derive from a single
finalTotals object computed from the final candidates array,
eliminating the possibility of divergent counts between CLI
output and persisted dashboard data.
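
The fix follows a single-source-of-truth pattern: compute the totals once and spread the same object into both outputs. Below is a minimal sketch with hypothetical candidate fields, action names, and total keys (only the finalTotals idea comes from the commit).

```typescript
// Hypothetical candidate shape for illustration.
type Action = "evolve" | "watch" | "skip";

interface FinalCandidate {
  skill: string;
  action: Action;
  deployed: boolean;
}

interface Totals {
  evaluated: number;
  deployed: number;
  watched: number;
  skipped: number;
}

// Counts are derived only from the final candidates array.
function computeFinalTotals(candidates: FinalCandidate[]): Totals {
  return {
    evaluated: candidates.length,
    deployed: candidates.filter((c) => c.action === "evolve" && c.deployed).length,
    watched: candidates.filter((c) => c.action === "watch").length,
    skipped: candidates.filter((c) => c.action === "skip").length,
  };
}

// CLI summary and persisted run report both read the same finalTotals object,
// so their counts cannot drift apart.
function buildOutputs(candidates: FinalCandidate[], timestamp: string) {
  const finalTotals = computeFinalTotals(candidates);
  return {
    summary: { ...finalTotals },
    runReport: { timestamp, ...finalTotals },
  };
}
```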

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: demo-ready CLI consolidation + autonomous orchestration (#52)

* Refresh architecture and operator docs

* docs: align docs and skill workflows with autonomy-first operator path

Reframes operator guide around autonomy-first setup, adds orchestrate runs
endpoint to architecture/dashboard docs, and updates skill workflows to
recommend --enable-autonomy as the default initialization path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: autoresearch-inspired UX improvements for demo readiness

- orchestrate --loop: continuous autonomous improvement cycle with configurable interval
- evolve: default cheap-loop on, add --full-model escape hatch, show diff after deploy
- bare `selftune` shows status dashboard instead of help text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: show selftune resource usage instead of session-level metrics in skill report

Skill report cards now display selftune's own LLM calls and evolution
duration per skill (from orchestrate_runs) instead of misleading
session-level token/duration aggregates. Also extracts tokens and
duration from transcripts into canonical execution facts for future use.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: consolidate CLI from 28 flat commands to 21 grouped commands

Group 15 related commands under 4 parent commands:
- selftune ingest <agent> (claude, codex, opencode, openclaw, wrap-codex)
- selftune grade [mode] (auto, baseline)
- selftune evolve [target] (body, rollback)
- selftune eval <action> (generate, unit-test, import, composability)

Update all 39 files: router, subcommand help text, SKILL.md, workflow
docs, design docs, README, PRD, CHANGELOG, and agent configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit PR review comments

- Filter skip/watch actions from selftune_stats run counts
- Restore legacy token_usage/duration_stats from execution_facts
- Cooperative SIGINT/SIGTERM shutdown for orchestrate loop
- Validate --window as positive integer with error message
- Add process.exit guard for bare selftune status fallthrough
- Update ARCHITECTURE.md import matrix for Dashboard dependencies
- Fix adapter count, code fence languages, and doc terminology

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining review comments and stale replay references

- Escape SQL LIKE wildcards in dashboard skill name query
- Add Audit + Rollback steps to SKILL.md feedback loop
- Fix stale "replay" references in quickstart help text and quickstart.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CI lint failures

- Fix dashboard-server.ts indentation on LIKE escape pattern
- Prefix unused deployedCount/watchedCount with underscore
- Format api.ts import to multi-line per biome rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: escape backslash in SQL LIKE pattern to satisfy CodeQL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add system status page to dashboard with doctor diagnostics

Surfaces the doctor health checks (config, log files, hooks, evolution)
through a new /status route in the dashboard SPA, so humans can monitor
selftune health without touching the CLI.

- Add GET /api/v2/doctor endpoint to dashboard server
- Add DoctorResult/HealthCheck types to dashboard contract
- Create Status page with grouped checks, summary cards, auto-refresh
- Add System Status link in sidebar footer
- Update all related docs (ARCHITECTURE, HANDOFF, system-overview, Dashboard workflow)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add ESCAPE clause to LIKE query and fix stale replay label

- SQLite LIKE needs explicit ESCAPE '\\' for backslash escapes to work
- Rename "Replay failed" to "Ingest failed" in quickstart error output
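
The LIKE fixes combine two steps: escaping the metacharacters in user input, and declaring the escape character in SQL. Below is a minimal sketch. The query text, table, and column names are illustrative (skill_usage and skill_name appear elsewhere in this log), and the substring pattern is an assumption about how the dashboard matches names.

```typescript
// Escape LIKE metacharacters so user input matches literally. A single-pass
// replace handles \, % and _ together, so no inserted escape is escaped twice.
function escapeLike(input: string): string {
  return input.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// SQLite has no default escape character for LIKE: without the ESCAPE clause
// the backslashes added above would be matched as literal characters.
// Table and column names are illustrative.
const SKILL_SEARCH_SQL =
  "SELECT DISTINCT skill_name FROM skill_usage WHERE skill_name LIKE ? ESCAPE '\\'";

function skillSearchPattern(query: string): string {
  return `%${escapeLike(query)}%`; // substring match on the escaped input
}
```

The pattern is passed as a bound parameter, so escaping only affects LIKE semantics and is not a substitute for parameter binding.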

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review comments on status page PR

- Add "Other" fallback group for unknown check types in Status page
- Use compound key (name+idx) to avoid React key collisions
- Re-export DoctorResult types from types.ts instead of duplicating
- Fix orchestrate loop sleep deadlock on SIGINT/SIGTERM
- Replace stale SSE references with polling-based refresh in Dashboard docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: establish agent-first architecture principle across repo

selftune is a skill consumed by agents, not a CLI tool for humans.
Users install the skill and talk to their agent ("improve my skills"),
the agent reads SKILL.md, routes to workflows, and runs CLI commands.

- AGENTS.md: add Agent-First Architecture section + dev guidance
- ARCHITECTURE.md: add Agent-First Design Principle at top
- SKILL.md: add agent-addressing preamble ("You are the operator")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: demo-ready P0 fixes from architecture audit

Four parallel agent implementations:

1. SKILL.md trigger keywords: added natural-language triggers across 10
   workflows + 13 new user-facing examples ("set up selftune", "improve
   my skills", "how are my skills doing", etc.)

2. Hook auto-merge: selftune init now automatically merges hooks into
   ~/.claude/settings.json for Claude Code — no manual settings editing.
   Initialize.md updated to reflect auto-install.

3. Cold-start fallback: quickstart detects empty telemetry after ingest
   and shows hook-discovered skills or guidance message instead of blank
   output. No LLM calls, purely data-driven.

4. Dashboard build: added prepublishOnly script to ensure SPA is built
   before npm publish (CI already did this, but local publish did not).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: defensive checks fallback and clarify reserved counters

- Status.tsx: default checks to [] if API returns undefined
- orchestrate.ts: annotate _deployedCount/_watchedCount as reserved

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: prioritize Claude Code, unify cron/schedule, remove dead code

- Mark Codex/OpenCode/OpenClaw as experimental across docs, SKILL.md,
  CLI help text, and README. Claude Code is the primary platform.
- Unify cron and schedule into `selftune cron` with --platform flag
  for agent-specific setup. `selftune schedule` kept as alias.
- Remove dead _deployedCount/_watchedCount counters from orchestrate.ts
  (summary already computed via array filters in Step 7).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: document two operating modes and data architecture

- ARCHITECTURE.md: add Interactive vs Automated mode explanation,
  document JSONL-first data flow with SQLite as materialized view
- Cron.md: fix stale orchestrate schedule (weekly → every 6 hours),
  correct "agent runs" to "OS scheduler calls CLI directly"
- Orchestrate.md: add execution context table (interactive vs automated)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite all 22 workflow docs for agent-first consistency

Three parallel agents rewrote workflow docs so the agent (not the human)
is the operator:

Critical (3 files): Evolve.md, Evals.md, Baseline.md
- Pre-flight sections now have explicit selection-to-flag mapping tables
- Agent knows exactly how to parse user choices into CLI commands

Moderate (11 files): Initialize, Dashboard, Watch, Grade, Contribute,
UnitTest, Sync, AutoActivation, Orchestrate, Doctor, Replay
- "When to Use" sections rewritten as agent trigger conditions
- "Common Patterns" converted from user quotes to agent decision logic
- Steps use imperative agent voice throughout
- Replay.md renamed to "Ingest (Claude) Workflow" with compatibility note

Minor (8 files): Composability, Schedule, Cron, Badge, Workflows,
EvolutionMemory, ImportSkillsBench, Ingest
- Added missing "When to Use" sections
- Added error handling guidance
- Fixed agent voice consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add autonomous mode + connect agents to workflows

Autonomous mode:
- Evolve, Watch, Grade, Orchestrate workflows now document their
  behavior when called by selftune orchestrate (no user interaction,
  defaults used, pre-flight skipped, auto-rollback enabled)
- SKILL.md routing table marks autonomous workflows with †

Agent connections:
- All 4 agents (.claude/agents/) now have "Connection to Workflows"
  sections explaining when the main agent should spawn them
- Key workflows (Evolve, Doctor, Composability, Initialize) now have
  "Subagent Escalation" sections referencing the relevant agent
- SKILL.md agents table adds "When to spawn" column with triggers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove duplicate findRecentlyEvolvedSkills function

findRecentlyEvolvedSkills was identical to findRecentlyDeployedSkills.
Consolidated into one function used for both cooldown gating and
watch targeting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address 21 CodeRabbit review comments

AGENTS.md:
- Add missing hook files and openclaw-ingest to project tree
- Use selftune cron as canonical scheduling command

Status.tsx:
- Add aria-label and title to refresh button for accessibility

ARCHITECTURE.md:
- Use canonical JSONL filenames matching constants.ts
- Add text language specifier to code block

index.ts:
- Add --help handlers for grade and evolve grouped commands
- Add --help handler for eval composability before parseArgs

quickstart.ts:
- Fix stale "Replay" comment to "Ingest"

Workflow docs:
- Cron.md: fix --format to --platform, add text fence, add --skill-path
- Evals.md: fix HTML entities to literal angle brackets
- Evolve.md: replace placeholder with actual --pareto flag
- Grade.md: clarify results come from grading.json not stdout
- Ingest.md: fix wrap-codex error guidance (no --verbose flag)
- Initialize.md: use full selftune command form, fix relative path
- Orchestrate.md: fix token cost contradiction, document --loop mode
- Sync.md: clarify synced=0 is valid, fix output parsing guidance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: real-time improvement signal detection and reactive orchestration

Core feature: selftune now detects when users correct skill misses
("why didn't you use X?", "please use the commit skill") and triggers
focused improvement automatically when the session ends.

Signal detection (prompt-log.ts):
- Pure regex patterns detect corrections, explicit requests
- Extracts mentioned skill name from query text
- Appends to improvement_signals.jsonl (zero LLM cost)
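
Regex-based detection can be pictured as a short list of patterns tried in order, followed by a lookup of the captured name against installed skills. Below is a minimal sketch. The patterns are modeled only on the two example phrases above and use the `[\w-]+` capture adopted later for hyphenated names. `detectSignal`, `ImprovementSignal`, and the field names are hypothetical, and the actual prompt-log.ts patterns likely differ.

```typescript
// Hypothetical patterns and types; not the actual prompt-log.ts code.
type SignalType = "correction" | "explicit_request";

interface ImprovementSignal {
  type: SignalType;
  mentioned_skill?: string;
  query: string;
}

const PATTERNS: Array<{ type: SignalType; re: RegExp }> = [
  // "why didn't you use X?" / "why did you not use the X ..."
  { type: "correction", re: /\bwhy\s+(?:didn'?t|did\s+not)\s+you\s+use\s+(?:the\s+)?([\w-]+)/i },
  // "please use the commit skill"
  { type: "explicit_request", re: /\bplease\s+use\s+(?:the\s+)?([\w-]+)/i },
];

// Pure function: no I/O and no LLM call; the caller appends the result as one JSONL line.
function detectSignal(query: string, knownSkills: string[]): ImprovementSignal | null {
  for (const { type, re } of PATTERNS) {
    const m = re.exec(query);
    if (!m) continue;
    const candidate = m[1].toLowerCase();
    // Only attribute the signal to a skill that actually exists.
    const mentioned = knownSkills.find((s) => s.toLowerCase() === candidate);
    return { type, mentioned_skill: mentioned, query };
  }
  return null;
}
```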

Reactive trigger (session-stop.ts):
- Checks for pending signals when session ends
- Spawns background selftune orchestrate if signals exist
- Lockfile prevents concurrent runs (30-min stale threshold)
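
An exclusive-create lock with a stale threshold can be sketched with Node's `fs` API. The `"wx"` flag (mentioned in a later commit) makes creation fail with EEXIST if the file already exists. `acquireLock`, `releaseLock`, and the PID-in-file convention are hypothetical names and choices. The 30-minute threshold comes from the line above.

```typescript
import { closeSync, openSync, statSync, unlinkSync, writeSync } from "node:fs";

const STALE_LOCK_MS = 30 * 60 * 1000; // 30-minute stale threshold

// Returns a file descriptor if we created the lock, or null if it already exists.
function tryCreate(path: string): number | null {
  try {
    return openSync(path, "wx"); // atomic: fails with EEXIST if the file exists
  } catch (err: unknown) {
    if ((err as { code?: string }).code === "EEXIST") return null;
    throw err;
  }
}

// Reclaim a lock only if its mtime is older than the stale threshold.
function reclaimIfStale(path: string, now: number): boolean {
  try {
    if (now - statSync(path).mtimeMs < STALE_LOCK_MS) return false;
    unlinkSync(path);
  } catch {
    return false; // lock vanished or could not be removed; a later run can retry
  }
  return acquireLock(path, now);
}

function acquireLock(path: string, now: number = Date.now()): boolean {
  const fd = tryCreate(path);
  if (fd === null) return reclaimIfStale(path, now);
  try {
    writeSync(fd, String(process.pid)); // record the owner for debugging
  } finally {
    closeSync(fd);
  }
  return true;
}

function releaseLock(path: string): void {
  try {
    unlinkSync(path);
  } catch {
    // already removed; nothing to do
  }
}
```

Only the initial create is atomic here. The stale-reclaim path (stat, unlink, retry) can still race between two processes, which is acceptable for a lock whose only purpose is to avoid overlapping background runs.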

Signal-aware orchestrator (orchestrate.ts):
- Reads pending signals at startup (no new CLI flags)
- Boosts priority of signaled skills (+150 per signal, cap 450)
- Signaled skills bypass evidence and UNGRADED gates
- Marks signals consumed after run completes
- Lockfile acquire/release wrapping full orchestrate body
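
The boost rule and the gate bypass can be expressed as two small pure functions. Below is a minimal sketch. It assumes the 450 cap applies to the signal boost itself (three signals' worth), not to a skill's total priority. `signalBoosts` and `passesEvidenceGate` are hypothetical names.

```typescript
const SIGNAL_BOOST_PER_SIGNAL = 150;
const SIGNAL_BOOST_CAP = 450; // assumed to cap the boost, not the total priority

// Count pending signals per skill and convert them into a capped priority boost.
function signalBoosts(pending: Array<{ mentioned_skill?: string }>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const s of pending) {
    if (!s.mentioned_skill) continue; // signals without a resolved skill add no boost
    counts.set(s.mentioned_skill, (counts.get(s.mentioned_skill) ?? 0) + 1);
  }
  const boosts = new Map<string, number>();
  counts.forEach((n, skill) => {
    boosts.set(skill, Math.min(n * SIGNAL_BOOST_PER_SIGNAL, SIGNAL_BOOST_CAP));
  });
  return boosts;
}

// Signaled skills bypass the evidence gate: the user's correction is the evidence.
function passesEvidenceGate(checks: number, signaled: boolean, minChecks = 3): boolean {
  return signaled || checks >= minChecks;
}
```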

Tests: 32 new tests across 2 files (signal detection + orchestrator)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: document signal-reactive improvement across architecture docs

- ARCHITECTURE.md: add Signal-Reactive Improvement section with mermaid
  sequence diagram showing signal flow from prompt-log to orchestrate
- Orchestrate.md: add Signal-Reactive Trigger section with guard rails
- evolution-pipeline.md: add signal detection as pipeline input
- system-overview.md: add signal-reactive path to system overview
- logs.md: document improvement_signals.jsonl format and fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address 14 CodeRabbit review comments (round 4)

Agents:
- diagnosis-analyst: resolve activation contradiction (subagent-only)
- evolution-reviewer: inspect recorded eval file, not regenerated one
- integration-guide: carry --skill flag through to evolve command

Dashboard:
- Status.tsx: defensive fallback for unknown health status values

CLI:
- index.ts: remove redundant process.exit(0) after statusMain
- index.ts: strict regex validation for --window (reject "10days")

Quickstart:
- Remove misleading [2/3] prefix from post-step check

Workflows:
- SKILL.md: add text language specifier to feedback loop diagram
- Initialize.md: add blank line + text specifier to code block
- Orchestrate.md: fix sync step to use selftune sync not ingest claude
- Doctor.md: route missing-telemetry fixes by agent platform
- Evals.md: add skill-path to synthetic pre-flight, note haiku alias
- Ingest.md: wrap-codex uses wrapper not hooks for telemetry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: dependency map, README refresh, dashboard signal exec plan

1. AGENTS.md: Add Change Propagation Map — "if you change X, update Y"
   table that agents check before committing. Prevents stale docs.

2. README.md: Refresh for v0.2 architecture:
   - Agent-first framing ("tell your agent" not "run this command")
   - Grouped commands table (ingest, grade, evolve, eval, auto)
   - Signal-reactive detection mentioned in Detect section
   - Automate section with selftune cron setup
   - Removed CLI-centric use case descriptions

3. Exec plan for dashboard signal integration (planned, not started):
   - Schema + materialization + queries + contract + API + UI
   - 3 parallel agent workstreams, ~4.5 hours estimated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: migrate repo URLs from WellDunDun to selftune-dev org

Update all repo path references (badges, clone URLs, install command,
contribute PR target, security tab link, llms.txt) from personal
WellDunDun/selftune to org selftune-dev/selftune.

Kept as WellDunDun (personal account, not repo path):
- CODEOWNERS (@WellDunDun)
- FUNDING.yml (sponsors/WellDunDun)
- LICENSE copyright
- PRD owner field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add repo org/name migration to change propagation map

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address 13 CodeRabbit review comments (round 5)

Code fixes:
- prompt-log.ts: broaden skill capture regex to [\w-]+ for hyphenated names
- session-stop.ts: atomic lock acquisition with openSync("wx") + cleanup
- orchestrate.ts: re-read signal log before write to prevent race condition

Dashboard:
- Status.tsx: defensive defaults for healthy, summary, timestamp

Docs:
- ARCHITECTURE.md: use selftune cron setup as canonical scheduler
- Orchestrate.md: fix loop-interval default (3600s not 300s)
- Evals.md: fix option numbering (1-5) and selection mapping (4a/4b/4c)
- Initialize.md: use selftune init --force instead of repo-relative paths
- logs.md: document signal consumption as exception to append-only

Tests:
- signal-detection: fix vacuous unknown-skill test
- signal-orchestrate: exercise missing-log branch with non-empty signals

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: llms.txt branch-agnostic links, README experimental clarity

- llms.txt: /blob/master/ → /blob/HEAD/ for branch-agnostic URLs
- README line 28: clarify Claude Code primary, others experimental
- README line 38: remove redundant "Within minutes"
- README footer: match experimental language from Platforms section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>